binding protein (9). In this paper, we describe the DNA 
sequence of the exonuclease gene and of the neighbouring genes. 

The exonuclease was first detected as an increased alkaline 
nuclease activity present in HSV infected cells (10) and was 
subsequently purified and shown to be a single polypeptide chain 
(7,11,12,13), possessing both 5' and 3' exonuclease activity and 
also an endonuclease activity (11,14,15). Analysis of 
temperature-sensitive mutants of the HSV-2 exonuclease gene has 
shown that the enzyme is necessary for efficient replication of 
virus DNA, although its precise role has not been defined 
(16,17,18). The exonuclease gene was located in the long unique 
region (UL; see Figure 1) of HSV-1 DNA, near 0.17 map units, by 
hybrid arrest of translation of active enzyme (19). Analysis of 
the transcript organization of this region of the genome has 
shown that exonuclease mRNA is a member of a 3' coterminal set 
of mRNAs (20). This paper reports a 7800 bp sequence containing 
the whole of the 3' coterminal family, for HSV-1 strain 17, 
together with all or part of the two flanking genes. 

MATERIALS AND METHODS 

(1) Plasmids. 

Cloned restriction nuclease fragments of HSV-1 strain 17 DNA 
were used for sequence analysis, as follows: EcoRI 3 and EcoRI d 
in the EcoRI site of pACYC184, from V.G. Preston; and KpnI f and 
KpnI 3 cloned into the PstI site of pAT153 by dG/dC tailing, 
from A.J. Davison. 

(2) Sequence analysis and interpretation. 

DNA sequences were determined by M13/dideoxy methods, as 
described (9,21,22). Computing was performed with a DEC PDP 
11/44 under RSX11M, with programs as previously described 
(9,22). 

(3) Description of the HSV-1 genome. 

HSV genetic and mRNA mapping data commonly specify genome 
locations in terms of fractional map units, running from 0.000 
to 1.000. There exist significant discrepancies between the 
numbering for HSV-1 strain 17 used by us and, for instance, 
strain KOS (20,23); these are of the order of 0.005 map units or 
800 bp. We deal with this, for the present, by quoting map 
units as approximations to two decimal places only. 
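The discrepancy quoted above also fixes a rough scale. If 0.005 map units correspond to about 800 bp, the genome is about 160,000 bp long. The Python sketch below converts between the two units on that basis. The 160,000 bp constant comes only from the ratio stated in the text and is not a measured genome length. The function names are hypothetical.

```python
# Sketch: converting between fractional map units (0.000 to 1.000) and base pairs.
# ASSUMPTION: genome length = 800 bp / 0.005 map units = 160,000 bp, the ratio
# implied by the text; this is an approximation, not a measured value.
GENOME_LENGTH_BP = round(800 / 0.005)  # 160000


def map_units_to_bp(map_units: float, genome_length: int = GENOME_LENGTH_BP) -> int:
    """Approximate bp offset from the left end of the genome for a map-unit position."""
    if not 0.0 <= map_units <= 1.0:
        raise ValueError("map units run from 0.000 to 1.000")
    return round(map_units * genome_length)


def bp_to_map_units(bp: int, genome_length: int = GENOME_LENGTH_BP) -> float:
    """Map-unit position, rounded to two decimal places as adopted in the text."""
    return round(bp / genome_length, 2)


print(map_units_to_bp(0.17))   # approximate bp position of the exonuclease locus
print(bp_to_map_units(27200))
```

Rounding to two decimal places follows the convention adopted here. Quoting finer values would carry over the strain-numbering discrepancy.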




Figure 1. The upper part shows a conventional representation of 
the HSV-1 genome, with major repeat elements as open boxes, the 
long and short unique sequences (UL and US) indicated, and 
fractional genome lengths (map units) marked. In the lower part 
of the figure an expansion (7.8 kb) is given for the region 0.16 
to 0.20 map units, with numbering in kb corresponding to Figure 2. 
Locations and orientations of mRNAs (20,23,24) are shown as 
arrows, and proposed protein coding regions as open boxes. Note 
that transcript b is hypothetical. For transcript g, only the 
first exon and part of the intron region are included (23). 



RESULTS 

(1) Organization of the HSV-1 genome from 0.16 to 0.20 map units. 

[...] For this paper we have [...] (Figure 1). [...] the 3' 
coterminal [...]. Two of these, e and f, were described as late 
transcripts of 3.9 and 4.5 kb (20); d is an early 2.3 kb 
transcript expressing the exonuclease (20); and c is a late 
1.9 kb species (20). [...] late, rightward transcribed species 
(24). To 



Nucleic Acids Research 



the right of the group, RNA g is a late, rightward 2.7 kb 
transcript with a large intron (23). Only the first exon is in 
the region treated here. Within the intron region are 3' 
coterminal, leftward transcripts (23); the downstream termini of 
these are shown generically as h in Figure 1. 

Organization in 3' coterminal families is a common feature 
of HSV gene arrangement (22,24). It is thought that each RNA is 
translated to give the polypeptide encoded adjacent to the 5' 
terminus, while distal reading frames remain unused. Figure 1 
also shows, for mRNA species a to g, the locations of the 
proposed corresponding protein coding regions, as deduced from 
our DNA sequence data. These open reading frames were evaluated 
using published mRNA mapping data (20,23,24), by analysis of 
codon usage compared with known HSV genes (25), and by 
comparisons with corresponding sequences from the genomes of 
varicella-zoster virus (VZV) and Epstein-Barr virus (EBV). VZV 
is, like HSV, a member of the Alphaherpesvirinae subfamily 
(26), although its DNA sequence and many details of genome 
organization differ substantially from those of HSV. The 
unpublished sequence of the corresponding region of the VZV 
genome was made available to us by our colleague A.J. Davison. 
EBV is a member of the Gammaherpesvirinae, and its complete 
genome sequence has been published (26,27). 
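The text evaluates candidate reading frames partly "by analysis of codon usage compared with known HSV genes" but does not name the statistic it used. As a minimal sketch, assuming a simple profile comparison would serve, the code below computes codon frequencies for an in-frame sequence. It then scores the sequence against a reference profile with half the L1 distance. That metric is a stand-in chosen for illustration, not the authors' method. The reference profile itself, which would be pooled from known HSV genes, is not given here.

```python
from collections import Counter
from itertools import product

# All 64 codons (sense and stop).
CODONS = ["".join(c) for c in product("TCAG", repeat=3)]


def codon_frequencies(cds: str) -> dict:
    """Relative frequency of each codon in an in-frame coding sequence."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds), 3))
    total = sum(counts[c] for c in CODONS)  # codons containing N etc. are ignored
    if total == 0:
        raise ValueError("no valid codons")
    return {c: counts[c] / total for c in CODONS}


def usage_distance(query: dict, reference: dict) -> float:
    """Half the L1 distance between two codon-frequency profiles:
    0.0 for identical usage, 1.0 for completely disjoint usage."""
    return 0.5 * sum(abs(query[c] - reference[c]) for c in CODONS)


orf = codon_frequencies("ATGGCCGCGTAA")
print(usage_distance(orf, orf))  # identical profiles
```

Under this sketch, a lower distance to the HSV reference would suggest a more HSV-like reading frame. A real evaluation would also need a background for comparison, such as frames read out of phase.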

We have determined the DNA sequence of HSV-1 strain 17 for 
the region shown in Figure 1. This sequence is presented in 
Figure 2 as 7800 bp, of base composition 65.2% G+C. Proposed 
encoded amino acid sequences are also shown. The sequence was 
determined by the M13/dideoxy system with random sub-fragments 
of four large, plasmid-cloned fragments of HSV-1 DNA (KpnI f and 
3, and EcoRI d and 3). KpnI f and 3 lie to the left and right, 
respectively, of the Kpnl site at residue 5958 in Figure 2, and 
EcoRI d and 3 lie to the left and right, respectively, of the 
EcoRI site at 6670. The sequence as presented starts at an 
arbitrary point to the left of the mRNA a region and ends 
downstream of the mRNA g splice donor site. In the following 
sections each gene is treated in turn. 

(2) Gene "a" encodes a hydrophobic protein. 

A rightward transcribed 2 kb mRNA has been mapped to the 
left of the exonuclease gene region (24). We consider that the 
3' terminus of this RNA is near residue 2002 of Figure 2, 
downstream of the appropriately placed polyadenylation consensus 
AATAAA at 1977. The 5' terminus should therefore be near the 
start of the sequence presented here, but has not been mapped 
precisely. We propose that the protein coding region of gene a 
starts with ATG at residue 538 and closes with TAG at 1957. 
This reading frame of 473 codons encodes a protein of Mr 51389, 
now termed 51K. 
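The coding-region lengths throughout this paper follow from the quoted coordinates. The sketch below assumes that each position is the first base of the ATG or of the stop codon, read in the gene's own 5'-3' direction and numbered as in Figure 2. Under that reading, a gene's length is the distance between its two positions divided by three. This convention is our interpretation, but it reproduces every amino acid count quoted in the text. The function name is hypothetical.

```python
def orf_length(start: int, stop: int) -> int:
    """Number of amino acids encoded from a start codon up to (not including) the stop codon.

    `start` is the first base of the ATG and `stop` the first base of the stop codon,
    both in the rightward (Figure 2) numbering. Rightward genes have stop > start;
    leftward genes have stop < start.
    """
    span = abs(stop - start)
    if span % 3:
        raise ValueError("start and stop codons are not in the same reading frame")
    return span // 3


print(orf_length(538, 1957))   # gene a, rightward
print(orf_length(4221, 2343))  # exonuclease, leftward
```

The same function applies to the leftward genes described below, such as 10K (2425 to 2137) and 23K (6237 to 5592).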

No known protein or function has been assigned clearly to 
this gene. The encoded protein contains a high content of 
hydrophobic amino acids (the three most abundant amino acid 
species are Ala, Val and Leu), and also an excess of basic over 
acidic residues. The hydrophobic residues are notably 
clustered. There are at least five regions which, from their 
degree of hydrophobicity and absence of charged residues, could 
span a lipid bilayer membrane. We think it possible that 51K is 
a previously undescribed, integral membrane protein, similar to 
that encoded by HSV-1 near 0.74 map units which is involved in 
virus-induced cell fusion (28). The 51K amino acid sequence 
possesses low but definite homology with that of the EBV reading 
frame BBRF3 (27) (see section 8, below). 
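The text does not say how the candidate membrane-spanning regions were identified. A common approach is a Kyte-Doolittle hydropathy scan. It averages the Kyte and Doolittle (1982) values over a 19-residue window and flags windows scoring about 1.6 or higher. The sketch below uses that scan as a swapped-in method, adding the text's second criterion that a segment contain no charged residues. Treating only D, E, K and R as charged, together with the window size and threshold, are assumptions.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
CHARGED = set("DEKR")  # ASSUMPTION: histidine is treated as uncharged


def candidate_tm_segments(protein: str, window: int = 19, threshold: float = 1.6) -> list:
    """1-based (start, end) residue ranges formed by merging overlapping windows
    that are strongly hydrophobic and contain no charged residues."""
    protein = protein.upper()
    segments = []
    for i in range(len(protein) - window + 1):
        chunk = protein[i:i + window]
        if CHARGED & set(chunk):
            continue  # any charged residue disqualifies the window
        if sum(KD[aa] for aa in chunk) / window < threshold:
            continue  # not hydrophobic enough
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)  # overlaps previous hit: extend it
        else:
            segments.append((start, end))
    return segments


print(candidate_tm_segments("K" * 10 + "L" * 25 + "K" * 10))
```

Applied to the 51K sequence in Figure 2, such a scan would be one way to count the hydrophobic clusters. We do not claim it reproduces the five regions identified in the text.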

(3) A possible small gene at the 3' end of the coterminal family. 

As described in the next section, mRNAs c, d, e and f have 
their 3' termini near residue 2115. Upstream of this (on the 
leftward 5'-3' strand) lies the exonuclease coding region, 
which terminates at TGA, residue 2343. There also exists a 
small open reading frame overlapping the downstream end of the 
exonuclease reading frame, out of phase, from ATG at 2425 to TAA 
at 2137. This would encode a protein of 96 amino acids, Mr 
10486, now called 10K. There is no mapped mRNA corresponding to 
this reading frame. However, we think this small gene is 
probably real, for the following reasons. Both EBV and VZV 
possess corresponding open reading frames, with amino acid 
sequence homology to the HSV-1 candidate. For the EBV example 
(reading frame BBLF1), a promoter has been identified (27). 
Lastly, the 10K reading frame shows a reasonable codon usage. A 
hypothetical mRNA species (b) for this protein is indicated in 
Figure 1. 

Figure 2. DNA sequence of the HSV-1 genome, 0.16 to 0.20 map 
units. The DNA sequence is shown for the expanded region of 
Figure 1, as the rightward 5'-3' strand only. Precisely mapped 
5' termini of mRNAs (20,23) and presumed 3' termini are marked. 
Candidate polyadenylation sequences AATAAA are underlined. 
Predicted amino acid sequences are given in single letter code, 
with those from rightward 5'-3' transcripts above the DNA 
sequence, and those from leftward transcripts below. 

(4) The exonuclease gene. 

Costa et al. (20) have determined a 1000 bp sequence for 
HSV-1 strain KOS, corresponding to residues 3656-4654 of 
Figure 2. An extra residue is present at [...] in the KOS 
sequence (presumed an error), and there are 10 base 
substitutions. Within this sequence Costa et al. mapped the 5' 
termini of the mRNAs here called c and d. mRNA d encodes the 
exonuclease and has its 5' terminus at position 4380 of Figure 2 
(20). From the estimated size of RNA d (2.3 kb), it is clear 
that the common 3' termini of this mRNA family lie near residue 
2115, downstream of the polyadenylation consensus AATAAA at 2134 
on the leftward 5'-3' strand. 
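Polyadenylation signals are quoted here for both strands in the rightward numbering. The convention appears to be that the position is the motif's first base as read on its own strand. For a leftward motif this is the highest coordinate it occupies, which fits the 3' terminus near 2115 lying downstream of the AATAAA at 2134. That reading is an assumption. The sketch below finds a motif on both strands and reports hits under it. The function name is hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_motif_both_strands(rightward: str, motif: str = "AATAAA") -> list:
    """Return (position, strand) hits in 1-based rightward numbering.

    Each position is the motif's first base as read on its own strand, so a
    leftward hit is reported at the highest coordinate the motif occupies.
    """
    seq = rightward.upper()
    motif = motif.upper()
    rc = motif.translate(_COMPLEMENT)[::-1]  # leftward motif as seen on the rightward strand
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == motif:
            hits.append((i + 1, "rightward"))
        if window == rc:
            hits.append((i + k, "leftward"))
    return hits


print(find_motif_both_strands("GGTTTATTGG"))  # AATAAA on the leftward strand
```

A palindromic motif would be reported on both strands at once. AATAAA is not palindromic, so each hit here is unambiguous.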

We consider that the exonuclease coding region runs from ATG at 
4221 to TGA at 2343. This gives a protein of 626 amino acids, 
Mr 67503, which corresponds moderately with a recent estimate of 
the Mr from gel electrophoresis, of 85000 (13). The origin and 
function of mRNA c, whose 5' terminus was mapped to position 
3971 (20), are less clear. There is no obvious TATA box 
upstream of the 5' terminus of RNA c. Downstream of the 5' terminus, 
the first two potential initiator ATG codons are at 3843 and, 
174 nucleotides later, at 3669; both of these are in the 
exonuclease reading frame. It therefore appears that 
translation of RNA c would give rise to an N-terminally 
truncated exonuclease protein, starting from the first ATG, at 
3843. This translation product would contain 500 amino acids, 
with Mr 54395. 
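The text does not state how the Mr values were calculated. The standard approach is to sum average residue masses and add one water molecule for the free termini. A minimal sketch of that calculation follows, using the commonly tabulated average residue masses. Whether it matches the authors' exact values for the Figure 2 sequences has not been checked here.

```python
# Average residue masses (Da) of the 20 standard amino acids, as incorporated in a chain.
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.0153  # one H2O for the free N- and C-termini


def molecular_weight(protein: str) -> float:
    """Approximate Mr of a polypeptide given in single-letter code ('*' stop ignored)."""
    protein = protein.upper().rstrip("*")
    return sum(AVG_RESIDUE_MASS[aa] for aa in protein) + WATER


print(round(molecular_weight("MA"), 2))  # Met-Ala dipeptide
```

The average residue mass is about 108 Da for the 51K and exonuclease predictions (Mr divided by length). This value is typical and is a useful check on any computed Mr.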

We have found that the exonuclease amino acid sequence is 
clearly related to that of EBV reading frame BGLF5 (27) (see section 8). 
[...] of exonuclease: that of the EBV sequence is roughly 
[...] could be a functional protein. 
In a limited sequence analysis, Costa et al. (23) mapped 
the 5' terminus of mRNA e at a position corresponding to residue 
6025 (Figure 2), on the leftward 5'-3' strand. Curiously, this 
is just upstream of the best nearby TATA box candidate, at 6013 
to 6020. We think that translation of this mRNA starts with the 
first ATG, at 5836, and ends with TGA at 4282. This would give 
a protein of 518 amino acids, M r 57193, now called 57K. This 
assignment places the promoter, 5' terminus and part of the 5' 
non-coding region of the next downstream transcript, RNA d, 
within the protein coding region of RNA e. This arrangement has 
precedents in a number of other HSV genes (29,30). The 57K 
amino acid sequence is homologous to that of the EBV reading frame 
BGLF4. 

Costa et al. (20) tentatively identified either RNA e or RNA 
f as encoding a nucleocapsid protein designated VP19C. However, 
they apparently confused VP19C with another species, VP18.8 
(31,32), and we conclude that there is presently no firm 
evidence as to the identity of 57K or of the RNA f product, 23K, 
described next. 

(6) The 5' member of the coterminal RNA family. 

The largest member of the leftward-reading 3' coterminal 
family, RNA f, has been reported to be a 4.5 kb transcript (20). 
The 5' terminus of RNA f has not been mapped precisely, but we 
think it probable that it lies near 6470 or near 6590, 
downstream of TATA candidate sequences at 6484 and 6604. The 
most likely protein coding region runs from ATG at 6237 to TGA 
at 5592, encoding a protein of 215 amino acids, M r 23454, which 
is now termed 23K. This assignment of the coding region of RNA f 
thus includes the promoter and 5' terminal region of RNA e, and 
82 codons (out of phase) of the 57K coding sequence. 
Arrangement of coding regions is proposed to be similar in the 
corresponding region of the VZV genome (A.J. Davison, personal 
communication). However, our interpretation should be regarded 
as tentative, until these coding regions can be verified 
experimentally. 
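The overlap of 23K and 57K can be checked from the coordinates given above. The sketch below assumes that a quoted stop position is the first base of the stop codon and that both genes lie on the leftward strand. Two reading frames on the same strand are in phase when their start positions differ by a multiple of three. Counting the complete 57K codons that lie wholly within the 23K span, with the 23K stop codon included, gives 82 under this convention, and the start positions fall in different frames. The helper names are hypothetical.

```python
def orf_span(start: int, stop: int) -> tuple:
    """Lowest and highest rightward coordinates occupied by an ORF, stop codon included."""
    if stop > start:                 # rightward gene: stop codon is stop..stop+2
        return start, stop + 2
    return stop - 2, start           # leftward gene: stop codon is stop..stop-2


def same_frame(start_a: int, start_b: int) -> bool:
    """True if two ORFs on the same strand are in phase."""
    return (start_a - start_b) % 3 == 0


def codons_within(outer: tuple, inner: tuple) -> int:
    """Count complete codons of ORF `inner` that lie wholly inside ORF `outer`."""
    lo, hi = orf_span(*outer)
    start, stop = inner
    step = 3 if stop > start else -3
    count = 0
    for first in range(start, stop + step, step):     # first base of each inner codon
        low_base, high_base = (first, first + 2) if step > 0 else (first - 2, first)
        if lo <= low_base and high_base <= hi:
            count += 1
    return count


print(codons_within((6237, 5592), (5836, 4282)))  # 57K codons inside 23K
print(same_frame(6237, 5836))                     # False: out of phase
```

The same phase test confirms that the exonuclease frame (ATG at 4221) and the RNA c initiator at 3843 are in phase. It also confirms that the 10K frame (ATG at 2425) is not.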

(7) First exon of spliced RNA "g". 

A major recent surprise in analysis of HSV's gene 
organization was the description of a gene with a 4 kb intron, 
within which were located genes in the opposite orientation 
The end result of our analyses is a clear view of gene 
organization in this portion of HSV-1 DNA; this has been greatly 
facilitated by the extensive mRNA mapping data of Costa et al. 
(20,23). Some qualifications remain, and these will only be 
resolved by extensive, further studies. We see the most 
immediate of these as being, first, an evaluation of the 
N-terminally truncated protein thought to be translated from RNA 
c, and second, direct analysis of our proposed RNA b. 

Our comparisons with EBV showed that the part of the HSV-1 
genome examined shows general colinearity of organization with a 
region of the EBV genome. Such colinearity also extends 
rightward, at least in that the two genomes possess 
correspondingly placed and homologous second exon sequences for 
the rightward, spliced RNA g (23). However, Figure 4(b) shows 
that larger scale rearrangements have certainly occurred during 
the divergent evolution of these viruses. 

This sequence will be deposited with the EMBL Sequence 
Library. 